Predicting software faults in large space systems using machine learning techniques
Recently, the use of machine learning (ML) algorithms has proven to be of great practical value in solving a variety of engineering problems, including the prediction of failure, fault, and defect-proneness, as space system software becomes more complex. One of the most active areas of recent research in ML has been the use of ensemble classifiers. This work shows how ML techniques (or classifiers) can be used to predict software faults in space systems, including many aerospace systems, and further combines individual classifiers into ensembles that vote for the most popular class, to improve the prediction of system software fault-proneness. Benchmarking results on four NASA public datasets show that the Naive Bayes classifier is the more robust software fault predictor, while most ensembles with a decision tree classifier as one of their components achieve higher accuracy rates.
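The plurality-vote combination described above can be sketched in a few lines; the classifier names and votes below are illustrative stand-ins, not the paper's actual models:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine one predicted label per classifier by plurality vote.
    Ties are broken by the order in which labels first appear."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-classifier predictions for one software module:
# 1 = fault-prone, 0 = not fault-prone.
votes = [1, 0, 1]  # e.g. Naive Bayes, decision tree, k-NN
combined = majority_vote(votes)
```

Each base classifier contributes one vote, so a single weak classifier cannot flip the ensemble's decision on its own.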
Handling Out-of-Sequence Data: Kalman Filter Methods or Statistical Imputation?
The issue of handling sensor measurement data over single and multiple lag delays, also known as out-of-sequence measurement (OOSM), is considered. It is argued that this problem can also be addressed using model-based imputation strategies, and their application is demonstrated in comparison to Kalman filter (KF)-based approaches for a multi-sensor tracking prediction problem. The effectiveness of two model-based imputation procedures against five OOSM methods was investigated in Monte Carlo simulation experiments. The delayed measurements were either incorporated (or fused) at the time they finally became available (using OOSM methods) or imputed at random, with a higher probability of delay for multiple lags and a lower probability for a single lag (using single or multiple imputation). For a single lag, estimates of target tracking computed from the observed data and those based on a dataset in which the delayed measurements were imputed were equally unbiased; however, the KF estimates obtained using the Bayesian framework (BF-KF) were more precise. When the measurements were delayed over multiple lags, there were significant differences in bias or precision between multiple imputation (MI) and the OOSM methods, with the former exhibiting superior performance at nearly all levels of measurement-delay probability and across the range of manoeuvring indices. Researchers working on sensor data are encouraged to take advantage of software that handles delayed measurements using MI, as tracking estimates are more precise and less biased in the presence of delayed multi-sensor data than those derived from an observed-data analysis approach. Defence Science Journal, 2010, 60(1), pp. 87-99, DOI: http://dx.doi.org/10.14429/dsj.60.11
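As a rough illustration of the two families compared above, here is a minimal sketch of a scalar Kalman measurement update alongside a simple mean-imputation stand-in for a delayed measurement; the numbers and noise variances are hypothetical, and the paper's BF-KF and MI procedures are considerably more elaborate:

```python
def kf_update(x, P, z, R):
    """Scalar Kalman measurement update: fuse the current estimate x
    (variance P) with a measurement z (noise variance R)."""
    K = P / (P + R)          # Kalman gain
    x_new = x + K * (z - x)  # corrected estimate
    P_new = (1 - K) * P      # reduced uncertainty
    return x_new, P_new

def mean_impute(observed):
    """Single imputation: stand in for a missing (delayed) measurement
    with the mean of the measurements observed so far."""
    return sum(observed) / len(observed)

x, P = 0.0, 4.0
x, P = kf_update(x, P, z=2.0, R=1.0)       # fuse an on-time measurement
z_delayed = mean_impute([1.8, 2.1, 2.0])   # proxy for a late measurement
x, P = kf_update(x, P, z_delayed, R=1.0)   # fuse the imputed value
```

Multiple imputation would repeat the fill-in step several times with random draws and pool the resulting estimates, rather than using a single mean as here.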
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning, and they are currently applied to a variety of data mining and knowledge discovery applications, particularly classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited by the assumption that the data are complete.
In this thesis, contributions to the fields of machine learning and statistics that solve the problem of extracting DTs from incomplete databases for learning and classification tasks are presented. The methodology underlying the thesis blends well-established statistical theory with the most advanced techniques for machine learning and automated reasoning under uncertainty.
The first contribution is an extensive simulation study of the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values are in both the training and test sets or in either of the two. All simulations are performed under the missing completely at random (MCAR), missing at random (MAR) and informatively missing (IM) mechanisms, and for different missing-data patterns and proportions.
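The three missingness mechanisms named above can be mimicked with simple masking functions; this sketch assumes list-valued features and illustrative drop-out probabilities, and is not the thesis's actual simulation design:

```python
import random

def mcar_mask(values, p):
    """Missing completely at random: every entry hidden with prob p."""
    return [None if random.random() < p else v for v in values]

def mar_mask(values, side_info, p_low, p_high):
    """Missing at random: the drop-out probability depends on an
    observed covariate (side_info), not on the hidden value itself."""
    return [None if random.random() < (p_high if s else p_low) else v
            for v, s in zip(values, side_info)]

def im_mask(values, threshold, p):
    """Informatively missing: large values are preferentially hidden,
    so missingness depends on the value that goes missing."""
    return [None if v > threshold and random.random() < p else v
            for v in values]
```

Running a learner on data masked by each mechanism, at several proportions, is the basic shape of such a simulation study.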
The next contribution is a simple, novel, yet effective procedure for training and testing decision trees in the presence of missing data. Original, simple splitting criteria for attribute selection during tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real-world application domains. The proposed algorithm maintains (and sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes, and it improves accuracy considerably on informatively missing (IM) data. Another major advantage of this method over multiple imputation is the substantial saving in computational resources due to its simplicity.
The next contribution is three versions of simple probabilistic techniques for classifying incomplete vectors using decision trees built from complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches achieve quality comparable to sophisticated algorithms such as multiple imputation, and are therefore applicable to all kinds of datasets.
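The fractional-cases idea mentioned above (routing an incomplete instance down every branch, weighted by branch frequencies from the training data) can be sketched as follows; the toy tree, attribute names and probabilities are invented for illustration and are not the thesis's technique:

```python
# A tiny decision tree: an internal node names an attribute and holds,
# for each attribute value, (branch weight from training data, subtree).
# Leaves hold a class-probability dict.
tree = ("outlook", {
    "sunny": (0.4, {"play": 0.2, "stay": 0.8}),
    "rainy": (0.6, {"play": 0.7, "stay": 0.3}),
})

def classify(node, instance, weight=1.0):
    """Route an instance down the tree; on a missing attribute, split it
    fractionally across all branches and pool the leaf distributions."""
    if isinstance(node, dict):               # leaf: class distribution
        return {c: weight * p for c, p in node.items()}
    attr, branches = node
    value = instance.get(attr)
    if value in branches:                    # attribute observed
        _, subtree = branches[value]
        return classify(subtree, instance, weight)
    scores = {}                              # attribute missing
    for frac, subtree in branches.values():
        for c, p in classify(subtree, instance, weight * frac).items():
            scores[c] = scores.get(c, 0.0) + p
    return scores
```

A complete instance follows a single path as usual; an instance with the attribute missing receives a weighted mixture of the leaf distributions it could have reached.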
Finally, two novel ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Empirical tests are used to evaluate and validate the proposed ensemble methods against individual missing-data techniques; EMIMIA attains the highest overall level of prediction accuracy.
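A generic version of the resampling step underlying a bagging-style ensemble such as REMIMIA can be sketched as below; the stand-in "learner" that simply predicts the majority training label is purely illustrative and is not the thesis's algorithm:

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Same-size resample drawn with replacement (the bagging step)."""
    return [rng.choice(data) for _ in data]

def train_majority(labels):
    """Stand-in learner: predicts the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

rng = random.Random(1)
labels = ["ok", "ok", "fault", "ok", "fault"]
committee = [train_majority(bootstrap_sample(labels, rng))
             for _ in range(5)]
prediction = Counter(committee).most_common(1)[0][0]
```

Each committee member sees a different bootstrap replicate, so the combined vote is less sensitive to any one unlucky sample; an EMIMIA-style variant would instead train each member on the full data with a different missing-data technique.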
Leakage current minimisation and power reduction techniques using sub-threshold design
Abstract: Low-power IC solutions are in great demand as the rapid advancement of handheld devices, wearables, smart cards and radio frequency identification brings a massive number of new products to market that all share the same primary need: powering the device for as long as possible between battery recharges while dramatically decreasing the device's leakage currents. Sub-threshold techniques can be a powerful way to create circuits that consume dramatically less energy than those built using standard design practices. In this research, an SOI device was built in Silvaco software to compare electrical characteristics, focusing on three main characteristics: threshold voltage, sub-threshold voltage and leakage current. It was found that SOI devices are ideal candidates for low-power operation.
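The leakage behaviour discussed above is commonly modelled with the textbook exponential sub-threshold drain-current equation; the sketch below uses hypothetical fitting constants (I0 and the ideality factor n) and is a generic illustration, not derived from the Silvaco simulations in this work:

```python
import math

K_OVER_Q = 8.617e-5      # Boltzmann constant / electron charge, V/K

def subthreshold_current(vgs, vds, vth, i0=1e-7, n=1.5, temp=300.0):
    """Textbook sub-threshold drain current (A):
    I_D = I0 * exp((Vgs - Vth) / (n * VT)) * (1 - exp(-Vds / VT)),
    with thermal voltage VT = kT/q (~25.9 mV at 300 K).
    I0 and n are hypothetical process-dependent fitting constants."""
    vt = K_OVER_Q * temp
    return i0 * math.exp((vgs - vth) / (n * vt)) * (1 - math.exp(-vds / vt))

# Lowering Vgs by one sub-threshold swing (n * VT * ln 10, ~89 mV here)
# cuts the leakage by about a factor of ten:
i_hi = subthreshold_current(vgs=0.3, vds=1.0, vth=0.4)
i_lo = subthreshold_current(vgs=0.3 - 1.5 * K_OVER_Q * 300 * math.log(10),
                            vds=1.0, vth=0.4)
```

The exponential dependence on Vgs - Vth is why raising the threshold voltage (or operating SOI devices with a steep sub-threshold slope) suppresses leakage so effectively.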
Ensemble missing data techniques for software effort prediction
Constructing an accurate effort prediction model is a challenge in software engineering. The development and validation of models used for prediction tasks require good-quality data. Unfortunately, software engineering datasets tend to suffer from incompleteness, which can result in inaccurate decision making, project management and implementation. Recently, machine learning algorithms have proven to be of great practical value in solving a variety of software engineering problems, including through the use of ensemble (combining) classifiers. Research indicates that ensembles of individual classifiers, which vote for the most popular class, lead to a significant improvement in classification performance. This paper proposes a method for improving the accuracy of software effort prediction produced by a decision tree learning algorithm, by generating an ensemble that uses two imputation methods as its elements. Benchmarking results on ten industrial datasets show that the proposed ensemble strategy has the potential to improve prediction accuracy over an individual imputation method, especially if multiple imputation is a component of the ensemble.
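The ensemble-of-imputations idea can be sketched as follows; the two imputation strategies (mean and median), the toy effort driver and the stand-in predictor are illustrative assumptions, not the paper's decision-tree learner or its industrial datasets:

```python
def impute(column, strategy):
    """Fill None entries with the column mean or median."""
    observed = sorted(v for v in column if v is not None)
    if strategy == "mean":
        fill = sum(observed) / len(observed)
    else:  # median
        mid = len(observed) // 2
        fill = (observed[mid] if len(observed) % 2
                else (observed[mid - 1] + observed[mid]) / 2)
    return [fill if v is None else v for v in column]

def ensemble_predict(column, predict):
    """Run the same learner on each imputed copy and average the
    estimates, so no single imputation choice dominates the result."""
    copies = [impute(column, s) for s in ("mean", "median")]
    return sum(predict(c) for c in copies) / len(copies)

# Hypothetical effort-driver values with gaps; the 'model' is a simple
# average-based proxy standing in for a trained decision tree.
loc = [10.0, None, 40.0, 20.0, None]
estimate = ensemble_predict(loc, lambda col: sum(col) / len(col))
```

Each imputation method yields a different completed dataset and hence a different model; the ensemble averages their predictions instead of committing to one completion.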
Improving the performance of the Ripper in insurance risk classification: a comparative study using feature selection
The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that the algorithm struggles with classification performance in the presence of missing data: it fails to classify instances reliably as data quality deteriorates with increasing missingness. In this paper, feature selection is used to help improve the classification performance of the Ripper model. Principal component analysis and evidence-based automatic relevance determination are used to improve performance, and a comparison is made to see which technique helps the algorithm most. Training datasets with completely observed data were used to construct the model, and test datasets with missing values were used to measure accuracy. The results showed that principal component analysis is the better feature selection technique for improving the Ripper's classification performance.
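For two features, the PCA step used above for feature selection reduces to a closed-form eigendecomposition of the 2x2 covariance matrix; this sketch is a generic illustration of that computation, not the paper's experimental pipeline:

```python
import math

def pca_2d(xs, ys):
    """Eigenvalues (variances along the principal axes, descending) of
    the 2x2 sample covariance matrix of two centred features."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    xs = [x - mx for x in xs]
    ys = [y - my for y in ys]
    a = sum(x * x for x in xs) / (n - 1)              # var(x)
    d = sum(y * y for y in ys) / (n - 1)              # var(y)
    b = sum(x * y for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    # Closed-form eigenvalues of [[a, b], [b, d]].
    mid, half = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mid + half, mid - half
```

When two features are strongly correlated, the second eigenvalue collapses towards zero, which is exactly why projecting onto the leading components can hand the Ripper fewer, cleaner inputs.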
Improving single classifiers prediction accuracy for underground water pump station in a gold mine using ensemble techniques
Abstract: In this paper, six single classifiers (support vector machine, artificial neural network, naïve Bayes classifier, decision trees, radial basis function and k-nearest neighbours) were used to predict water dam levels in a deep gold mine underground pump station. Bagging and Boosting ensemble techniques were also used to increase the prediction accuracy of the single classifiers. To enhance the prediction accuracy even further, a mutual information ensemble approach is introduced to improve on the single-classifier, Bagging and Boosting results. This ensemble is used to classify, and thus monitor and predict, the underground water dam levels at a single-pump-station deep gold mine in South Africa. Mutual information theory is used to determine the optimum number of classifiers with which to build the most accurate ensemble. In terms of prediction accuracy, the results show that the mutual information ensemble outperformed the other ensembles and the single classifiers, and is more efficient for classifying underground water dam levels. However, its construction is more complicated than the Bagging and Boosting techniques.
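One plausible building block for the mutual-information ensemble described above is the discrete mutual information between a classifier's predictions and the true labels; this generic sketch is not the paper's exact selection procedure:

```python
import math
from collections import Counter

def mutual_information(preds, labels):
    """I(pred; label) in bits for two discrete sequences; a higher value
    means the classifier's output carries more information about the
    true class, making it a better candidate for the ensemble."""
    n = len(preds)
    joint = Counter(zip(preds, labels))
    px, py = Counter(preds), Counter(labels)
    return sum((c / n) * math.log2((c * n) / (px[x] * py[y]))
               for (x, y), c in joint.items())
```

A perfect binary predictor on balanced classes scores 1 bit, and a predictor independent of the labels scores 0; ranking candidate classifiers by such a score is one way to decide how many to include.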
Simulation and parameter optimization of polysilicon gate biaxial strained silicon MOSFETs
Abstract: Improving the performance of strained silicon MOSFETs through conventional device scaling has become more complex because of the physical limitations associated with device miniaturization. A great deal of attention has therefore recently been paid to mobility-improvement technology based on applying strain to CMOS channels. This paper reviews the characteristics of strained-Si CMOS with an emphasis on the mechanism of mobility enhancement due to strain. The device physics behind the improved MOSFET performance is studied from the viewpoint of the electronic states of carriers in inversion layers and, in particular, the sub-band structures. In addition, a biaxial strained silicon NMOSFET (n-channel) is designed and simulated using Silvaco's Athena/Atlas simulator. The results make clear that biaxial strained silicon NMOS is one of the best alternatives to the current conventional MOSFET.
Predicting engineering student success using machine learning
Abstract: Recent years have seen an increase in the number of students from diverse backgrounds enrolling in South African universities, presenting many challenges. Some students struggle with their academic choices, and universities struggle to understand and address the individual needs of such a diverse student base. Fortunately, vast amounts of student information have been collected and stored, giving researchers in educational data mining an opportunity to derive useful insights from this data to help both the universities and the students. This research aims to identify factors that contribute to the success or failure of a student, and then to predict the student's future performance at enrolment. Using data pre-processing techniques, the experiments identify the most significant success factors available at enrolment time. These factors can then be used to identify students who may need extra support, and their nature can help determine the manner of support needed. This study implemented and evaluated the effectiveness of the most commonly used and newer machine learning algorithms in predicting student performance on a sample of 1366 engineering students. The results show various degrees of success in predicting student performance, and it is hoped that these findings will guide the selection of machine learning algorithms for future studies.